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In  order  to  understand  the  impact  of  channel  errors  on  protocols  such  as  ATM, 
understanding  of  the  error  process  occurring  on  a  channel  is  important.  Not  just 
the  rate  of  errors,  but  also  their  distribution,  will  have  significant  impact  on  how 
well  the  protocol  runs  over  the  channel,  and  on  what  sort  of  error  correction/detec¬ 
tion  protocol  is  required  to  best  overcome  the  disadvantages  of  the  link. 

This  paper  describes  the  special  purpose  Error  Injection  Units  (EIUs)  at  Rome 
Lab  and  summarizes  methods  for  generating  models  for  the  EIUs  which  were  pre¬ 
viously  in  use  at  Rome  Lab.  It  then  goes  on  to  describe  a  set  of  metrics  for  deter¬ 
mining  the  quality  of  an  EIU  model,  and  proposes  other  methods  of  building 
models  for  the  EIU.  Finally,  it  describes  the  process  to  take  in  order  to  use  and 
evaluate  these  models,  in  support  of  the  Joint  Advanced  Demonstration  Environ¬ 
ment’s  (JADE)  project  at  Rome  Lab  and  other  sites  around  the  nation. 
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1  Introduction 


A  tactical  environment  necessitates  the  use  of  “disadvantaged”  links  —  links  with 
relatively  low  bandwidth  and  a  relatively  large  number  of  errors.  Furthermore,  in 
order  to  provide  seamless  integration  in  a  tactical  environment,  and  thus  provide 
the  best  possible  information  services  to  the  warrior,  some  sort  of  standard  proto¬ 
col,  such  as  Asynchronous  Transfer  Mode  (ATM)  is  desirable.  In  order  to  under¬ 
stand  the  impact  of  channel  errors  on  protocols  such  as  ATM,  understanding  of  the 
error  process  occurring  on  a  channel  is  important.  Not  just  the  rate  of  errors,  but 
also  their  distribution,  will  have  significant  impact  on  how  well  the  protocol  runs 
over  the  channel,  and  on  what  sort  of  error  correction/detection  protocol  is 
required  to  best  overcome  the  disadvantages  of  the  link. 

This  paper  describes  the  special  purpose  Error  Injection  Units  (EIUs)  at  Rome 
Lab  and  summarizes  previously  developed  methods  of  generating  models  for  the 
EIUs  from  actual  data.  It  then  goes  on  to  describe  a  set  of  metrics  for  determining 
the  quality  of  an  EIU  model.  It  proposes  other  methods  of  building  models  for  the 
EIU.  Finally,  it  describes  procedures  for  evaluating  these  methods,  both  by  use  of 
the  metrics  and  through  comparison  of  results  with  the  actual  protocols  under  test. 

The  remainder  of  this  paper  is  organized  as  follows.  Section  2  is  the  background 
material,  including  an  overview  of  the  operation  of  the  EIUs,  and  early  work  done 
in  deriving  models  based  on  data.  Section  3  proposes  a  set  of  metrics,  based  both 
on  the  strengths  of  the  earlier  methods  and  on  its  perceived  weaknesses.  Section  4 
presents  two  alternate  methods  of  model  building.  The  first  is  an  adaptation  of  a 
model  from  the  literature.  The  second,  which  is  still  under  development,  is 
designed  to  satisfy  one  of  the  proposed  metrics.  Section  5  describes  an  ideal  use 
of  these  models,  from  data  collection  through  model  selection.  Section  6  con- 


2 


eludes  the  paper,  and  Section  7  is  a  bibliography.  Finally,  Section  8  acknowledges 
those  who  had  a  part  in  this  project. 

2  Background 

2.1  The  Error  Injection  Units 

The  error  injection  units  (EIUs)  were  designed  and  built  for  Rome  Lab  in  the  early 
90’s.  Their  purpose  is  to  emulate  a  disadvantaged  link  so  that  system  level  tests 
could  be  conducted  without  committing  the  resources  of  an  actual  link  of  the 
desired  type.  The  EIUs  take  as  input  a  digital  signal  at  up  to  T1  speed,  and  intro¬ 
duce  (or  inject)  errors  into  the  stream  according  to  a  configurable,  8  state  Markov 
chain.  The  EIUs  are  configured  with  a  state  transition  matrix;  additionally,  each 
state  can  be  configured  to  be  either  an  error  state  or  an  error  free  state. 

Each  EIU  is  equipped  with  a  pseudo  random  number  generator,  and  maintains 
state  information.  During  every  bit  time,  the  EIU: 

•  Generates  a  random  number 

•  Determines  what  state  to  transition  into,  and  makes  that  transition 

•  Outputs  a  bit  based  on  the  incoming  bit,  and  the  error  status  of  the  current  state 

With  appropriate  use  of  the  states,  it  is  easy  to  create  many  of  the  ‘standard’  error 
models,  including  Bernoulli  errors  (independent,  identical),  ‘bursty’  errors,  and 
much  more  general  types  of  error  processes. 
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2.2  Error  Gap  Modelling 

In  this  section,  we  present  the  method  of  model  building  previously  practiced  at 
Rome  Lab,  as  well  as  earlier  models  which  are  presented  here  for  completeness. 

2.2.1  The  Bernoulli  Model 

The  simplest  error  model  is  that  which  is  implicit  made  by  a  variety  of  analytical 
equipment,  notably,  the  standard  Bit  Error  Rate  Tester  (BERT).  This  model  has 
one  parameter  —  the  average  rate  at  which  errors  occur.  The  error  status  of  each  bit 
is  assumed  to  be  independent  of  the  error  status  of  each  other  bit.  While  this 
model  is  attractive  in  its  simplicity,  it  tends  to  have  poor  predictive  power  over 
satellite  and  other  radio  links,  due  to  time-persistent  effects  such  as  channel  fad¬ 
ing.  These  effects  tend  to  cause  errors  to  occur  in  groups,  commonly  called  bursts. 
This  leads  us  to  investigate  more  complicated  error  models  which  take  this  into 
account. 

2.2.2  Gilbert’s  Model 

The  Gilbert  error  model  was  first  described  in  [Gilbert60],  as  a  description  of 
errors  on  a  telephone  channel.  Gilbert  assumed  that  errors  occurred  in  bursts  of 
exponentially  distributed  length,  and  that  error-free  runs  were  also  exponentially 
distributed  in  length.  Furthermore,  he  assumed  that  during  an  error  burst,  each  bit 
was  in  error  independently  with  some  probability  1  -  h .  This  model  is  described 
more  carefully  in  Section  4. 
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2.2.3  Tsai’s  Model 


The  previous  work  done  in  generating  models  (i.e.,  transition  matrices)  for  the 
EIU  was  primarily  done  by  Dr.  Wayne  Smith  of  Mississippi  State  University.  He 
assumed  that  the  errors  arrive  in  a  distribution  which  is  nearly,  but  not  exactly, 
Poisson.  This  leads  to  the  approximation  of  error  gap  distribution  as  the  sum  of 
exponentials.  The  resulting  model  has  exactly  one  error  state,  with  transitions  to 
and  from  each  of  the  non-error  states,  which  do  not  interact  with  one  another.  This 
model  corresponds  closely  to  those  discussed  in  [Tsai69]  and  [McManamon70]. 

The  procedure  which  results  for  building  a  model  from  data  is  relatively  straight¬ 
forward.  Since,  given  the  assumption  of  one  error  state,  the  model  is  completely 
determined  by  the  error  gap  distribution,  one  should  simply  build  a  histogram  of 
error-free-run  lengths;  fit  the  histogram  as  well  as  possible  as  a  sum  of  exponen¬ 
tials;  and  then  compute  the  appropriate  transition  probabilities,  so  that  the  error- 
free-run  distribution  will  match  the  fitted  curve.  Remarks  on  effective  curve  fitting 
can  be  found  in  [Smith93]. 

To  restate  the  result,  if  we  have  an  approximation  to  the  error  gap  distribution 

n  - 1 

(that  is,  /  such  that  P  [0  1 J 1]  =  /(m)  =  > Ate  ),  and  we  assume  that  states  1 

i  =  1 

through  n  -  1  are  error-free  states  and  state  n  is  the  error  state,  then  the  non-zero 
transitions  can  be  given  as 


pu  =  e  ' ,  for  i<n 

(EQ  1) 

Pm  =  Aie’>  for  i<n 

(EQ  2) 

Pm  =  1  -  Pu,  for  i<n 

(EQ  3) 
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(EQ4) 


n  - 1 

P nn  ~  ^  —  ^^jP ni 
i=  1 

It  is  relatively  straightforward  to  verify  that  this  set  of  transitions  produces  the 
desired  distribution  of  error  free  runs. 


3  Metrics 

There  are  two  primary  methods  to  evaluate  the  quality  of  a  model.  The  best 
method  is  to  compare  the  performance  of  the  protocol  or  application  of  interest 
when  run  over  the  actual  channel,  with  the  performance  that  is  predicted  by  the 
model.  However,  this  is  not  always  possible  or  practical.  Therefore,  metrics  must 
be  available  to  compare  the  distribution  of  errors  on  the  actual  channel  to  the  dis¬ 
tribution  of  errors  produced  by  the  model.  In  this  section,  several  such  metrics 
will  be  introduced. 

The  first  is  suggested  quite  directly  from  the  model  above.  If  we  believe  that  the 
burst-length  distribution  and/or  the  error-ffee-gap  distribution  are  important  char¬ 
acteristics  of  the  error  process  (as  is  no  doubt  the  case),  then  these  suggest  the  first 
of  our  measures:  goodness  of  fit  of  error  burst  distribution,  and  goodness  of  fit  of 
error  gap  distribution.  There  are  a  number  of  ways  we  could  evaluate  the  “good¬ 
ness”  of  fit  between  these  two  functions  described  in  [Smith93],  Briefly,  Smith 
proposes  use  of  the  sample  standard  deviation,  the  correlation  coefficient,  or  the 
plot  of  the  residuals. 

The  error  burst  distribution  length  can  be  generalized  by  using  the  parameter  A ,  as 
described  in  [Tsai69],  Here,  a  burst  does  not  necessarily  consist  of  all  errors,  but 
begins  and  ends  with  an  error,  and  has  a  density  of  errors  exceeding  A .  We  see 
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that  the  previous  definition  of  an  error  burst  is  just  a  special  case  of  the  generaliza¬ 
tion  with  A  =  1 .  Clearly,  however,  we  can  compare  the  gap  distributions  for  dif¬ 
ferent  values  of  A ,  and  thereby  obtain  additional  metrics. 

The  third  metric  is  motivated  by  considering  the  error  sequence  as  a  random  pro¬ 
cess.  Then  it  is  natural  to  ask  about  the  second  order  statistics  of  this  random  pro¬ 
cess;  in  particular,  the  autocorrelation  function.  Assuming  that  the  process  is 
wide-sense  stationary  (WSS),  and  that  we  represent  the  sequence  of  errors  as 
e0,  gj, ... ,  the  autocovariance  function  is  given  by 

Rk  =  E[eiei+k]  (EQ5) 

Since  the  random  process  is  binary,  it  is  clear  that  Rk>  is  exactly  the  probability 
that  two  bits  at  distance  k  are  simultaneously  1  (in  error).  In  particular,  note  that 
R0  =  E \e:e{]  ,  the  correlation  of  one  term,  is  equal  to  the  Bit  Error  Rate  (BER) 

(again,  because  the  rp  is  binary).  Also,  lim  Rk  =  (R0) 2 ,  since,  in  the  extreme  long 

*-+  — 

term,  bits  are  in  error  independently. 

In  any  case,  the  goodness  of  fit  of  the  autocorrelation  function  is  also  a  natural 
metric  to  use.  Once  again,  we  can  use  any  of  the  methods  proposed  by  [Smith93] 
to  measure  the  goodness  of  fit. 

Finally,  a  survey  of  recent  literature  provides  another  model  and,  with  it,  another 
metric.  It  is  natural  to  consider  the  probability  that  a  given  model  produced  the 
observed  error  sequence.  A  model  which  has  a  relatively  higher  probability  of 
producing  a  given  sequence  is,  intuitively,  more  likely  to  be  the  “Hidden  Markov 
Model”  producing  that  sequence.  (The  Markov  Model  is  hidden  because  we 
observe  the  error  sequence,  and  not  the  sequence  of  states  through  which  the 
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model  passes.  Since  there  are,  in  general,  multiple  error-producing  states  and  mul¬ 
tiple  error-free  states,  we  may  never  be  certain  what  state  the  model  is  in.)  Note 
that  direct  computation  of  this  probability  is  computationally  prohibitive;  fortu¬ 
nately,  there  is  a  well  known  method  for  computing  it  that  is  much  more  efficient. 
This  will  be  discussed  in  Section  4.3;  the  method  for  optimizing  this  method  is 
also  described  in  [Poritz88],[Turin93],  and  [Morgera91], 


4  Derivation  of  Models 

In  this  section,  several  methods  of  deriving  models  from  data  are  presented.  Note 
that  in  addition  to  those  presented  in  this  section,  Tsai’s  model  was  presented  in 
Section  2.2.3. 

4.1  The  Gilbert  Model 

The  Gilbert  Model  [Gilbert60]  is  a  two  state  model,  where  the  states  represent 
‘good’  and  ‘bad’  line  conditions.  (See  Figure  1).  In  the  good  state  G,  the  model 
produces  no  errors.  In  the  bad  state  B,  the  model  produces  an  error  with  probabil¬ 
ity  1  -  h . 


Figure  1.  Gilbert’s  model  state  diagram 
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Now,  it  should  be  mentioned  that  this  model  cannot  be  directly  encoded  in  an  EIU 
model,  because,  due  to  the  hardware  design,  the  EIUs  states  must  either  always 
produce  an  error,  or  never  produce  an  error;  they  cannot,  like  state  B,  produce  an 
error  with  some  fixed  probability.  However,  it  is  relatively  straightforward  to  split 
B  into  two  states,  one  producing  an  error,  and  the  other  error  free.  This  is  shown  in 
[Turin93];  see  Figure  2. 

qh 

q(l-h) 

q(l-h) 

Figure  2.  Three  state  EIU  equivalent  to  Gilbert’s  Model.  B1  is  the 
only  error  producing  state 

Gilbert  then  observes  that  there  are  three  independent  parameters,  since 
P  =  1  -Q,  and  p  =  1  -q.  Therefore,  three  measured  quantities  are  needed  to 
estimate  them.  He  chooses  the  observations 


a  =  P(  1) 
b  =  J>(1|1) 
E(1U) 

P(101)  +P(111) 


(EQ  6) 
(EQ  7) 

(EQ  8) 


Clearly,  a  can  be  measured  simply  by  dividing  the  number  of  errors  by  the  num¬ 
ber  of  bits,  b  can  be  measured  by  dividing  the  number  of  occurrences  of  the  bit 
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string  ‘11  ’  by  the  number  of  errors,  and  c  can  be  measured  by  counting  the  num¬ 
ber  of  occurrences  of  the  bit  strings  ‘101’  and  ‘111’,  and  performing  the  appropri¬ 
ate  operations.  Note  that  c  is  the  probability  of  a  1  occurring  in  the  center  position 
when  two  ones  occur  one  space  apart. 

It  is  easy  to  find  equations  for  a,  b,  and  c  in  terms  of  P ,  p ,  and  h .  Then,  solving 
these  equations  for  the  variables  on  the  right,  we  get 


ac  -  b 2 

^  2ac-b(a  +  c) 

(EQ  9) 

b=l-b- 

(EQ  10) 

Q 

,  _  ap 

\-h-a 

(EQ  11) 

It  is  interesting  to  note  that,  as  Gilbert  observed,  these  equations  tend  to  produce 
ridiculous  values  for  the  parameters  if  not  enough  data  is  presented.  Our  experi¬ 
ence  indicates  that  they  may  also  produce  such  out  of  range  values  if  the  observed 
error  process  does  not  fit  the  Gilbert  model. 

4.2  The  Autocorrelation  Matching  Model 

The  weakness  of  Tsai’s  method  is  that  it  assumes  that  adjacent  error  gaps  are  dis¬ 
tributed  independently  of  one  another.  It  is  easy  to  see  this  from  the  way  the 
model  is  constructed  -  after  each  error  burst,  the  model  selects  a  non-error  state 
independently  of  its  last  non-error  state,  and  then  tarries  in  that  state  until  the  next 
error  burst.  Also,  notice  that  it  is  built  entirely  from  the  probability  mass  function 
of  the  error  gaps.  No  information  about  the  placement  or  ordering  of  the  gap  sizes 
is  included  in  the  models  derivation. 
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In  order  to  try  to  include  this  sort  of  information,  we  can  try  to  fit  the  autocorrela¬ 
tion  function  of  the  data,  instead  of  just  the  gap  distribution.  The  autocorrelation 
function  gives  the  joint  probability  of  two  bits  being  in  error,  regardless  of 
whether  the  intervening  bits  are  in  error  or  not,  while  the  error  gap  characteriza¬ 
tion  gives  the  joint  probability  of  two  bits  being  in  error,  assuming  that  the  inter¬ 
vening  bits  are  error-free. 

In  this  section,  we  will  derive  expressions  for  the  autocorrelation  function,  and 
describe  briefly  the  difficulties  with  building  a  probability  transition  function  with 
the  desired  autocorrelation  function. 

Let  P  be  the  probability  transition  matrix  for  a  Markov  chain,  such  that  if  tc,  is  a 
vector  of  probabilities  of  being  in  each  state  at  time  t,  then  nl+ ,  =  Pnt  is  a  vector 
of  the  probabilities  of  being  in  each  state  at  time  /  +  1 .  There  exists  a  vector  ji0 
such  that  Jt0  =  Pk0 ;  this  is  the  long-term  steady  state  probability  of  being  in  each 
state.  Now,  if  we  let  £  be  a  diagonal  matrix  with  1  in  the  diagonal  entries  corre¬ 
sponding  to  error  states  and  0  in  the  other  diagonal  entries,  then  En0  is  the  long¬ 
term  steady  state  probability  vector,  with  ail  non-error  entries  zeroed.  Clearly,  the 
;  th  entry  of  P"En()  is  the  probability  of  seeing  an  error  string  lx",  ending  in  state 
i .  Then  EPnEn0  is  the  same  vector  with  all  non-error  ending  states  zeroed.  Finally, 
if  we  let  I  be  a  column  vector  of  l’s,  then  iTEPnEn0  is  the  probability  of  seeing 
the  string  lx"  “ 1 1 ,  or  Rn ;  thus 

Rn  =  lTEPnEn0  (EQ  12) 


11 


However,  this  form  of  the  autocorrelation  function  is  not  entirely  satisfying. 
Therefore,  we  continue  to  simplify.  Recall  that  any  matrix  A  can  be  rewritten 

A  =  SAS~l ,  where  A  is  a  diagonal  matrix  whose  entries  are  the  eigenvalues  of  A , 
and  5  is  a  matrix  containing  the  eigenvectors  of  A ,  one  in  each  column;  that  is,  if 
Xv  X2, ...,  Xm  are  the  eigenvalues  of  A  and  v1(v2, ...,  vm  are  the  corresponding 


0  . 

..  0 

eigenvectors,  then  A  = 

0 

X2  . 

..  0 

0 

0  . 

..  X 

m 

and  S  =  fvj  v2  ...  vm]  .  Using  this  decom 


position,  let  P  =  SAS~l ;  then 


0  . 

..  0 

Rn  =  lTE(SAS-l)"En0  =  VeSA^S-'EtIq  =  1 TES 

0 

Xj  .. 

..  0 

0 

0  ., 

••  K 

S-'Ertg  (EQ  13) 


It  is  clear  that  the  form  of  this  equation  will  be 


Rn  =  =  2*,enlog*'  (EQ  14) 

;=  i 

This  form  is  familiar;  it  is  the  same  as  that  we  dealt  with  in  fitting  the  gap  distribu¬ 
tions.  Thus,  we  have  only  to  measure  the  autocorrelation  function,  fit  it  with  m 
exponentials,  and  then  determine  matrices  E,  S,  and  A  to  give  us  the  desired 
function  Rn .  Note  that  A  is  determined  immediately  from  the  coefficients  in  the 
exponents.  This  leaves  only  the  problem  of  determining  E,  which  indicates  which 
states  are  error  states,  and  S,  the  eigenvector  matrix  of  P. 
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In  fact,  it  would  appear  that  the  solution  is  not  unique;  further,  no  closed  form  for 
a  solution  has  been  located.  Exploring  the  method  further,  however,  let  us  suppose 
that  we  set  the  number  of  error  states,  Ne,  a  priori.  Then  E  can  be  set  so  that  the 
first  elements  of  the  diagonal  are  one,  and  later  entries  are  zero,  depending  on  the 
number  of  error  states.  (It  does  not  matter  which  state  is  which,  so  there  is  no  loss 
of  generality  in  designating  the  first  Ne  states  to  be  error  states.)  Using  this  further 
information,  we  can  develop  some  information  about  the  necessary  form  of  the 
eigenvector  matrix. 

Note  that  any  probability  transition  matrix  has  columns  whose  entries  sum  to  1. 
As  a  result,  when  the  transition  matrix  multiplies  any  vector  (whose  entries  sum  to 
some  value  je),  the  resulting  vector  also  has  entries  summing  to  x.  Furthermore, 
the  long-term  state  occupancy  probability  vector  (7t0)  is  clearly  a  fixed  point  with 
respect  to  multiplication  by  the  transition  matrix.  This  is  the  same  as  saying  that  it 
is  an  eigenvector  corresponding  to  the  eigenvalue  1.  It  is  relatively  easy  to  see  that 
all  the  other  eigenvectors  must  have  elements  which  sum  to  0;  if  they  added  to 
one,  there  would  be  two  long-term  state  occupancy  probability  vectors  (which,  of 
course,  can  happen,  but  only  if  the  Markov  chain  is  disconnected,  which  we  do 
not  wish  to  consider).  If  the  sum  of  elements  was  neither  zero  nor  one,  then  there 
would  be  some  vectors  whose  sum  was  not  conserved  during  a  multiplication  by 
the  transition  matrix. 

Now,  let  us  first  consider  the  term  S~lEn0.  Note  that  SS~lEi t0  =  En0,  so  we  can 

m 

interpret  the  vector  S~lEn0  as  a  set  of  coefficients  such  that  ^divj  =  En0.  Here, 

i  =  1 

we  take  advantage  of  the  fact  that  the  eigenvectors  vf  can  be  scaled  however  we 
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like.  We  scale  them  in  such  a  way  that  dt  =  1 ,  for  i  >  1 .  Extending  this  to  the  vec- 


d{k » 

tor  AnS~1En0,  we  see  that  AnS~lEn0  =  ^2 

dmX" 

m  m 


m 

Rn  =  d.BER  +  ^X^y.j  (EQ  15) 

i  =  2  j  =  1 

From  the  initial  description  of  the  autocorrelation  function,  then,  we  know  that 
dx  =  HER .  To  summarize  our  knowledge  about  S,  then,  we  observe  that  Vj  =  k0  , 

m  n,  N, 

(BER)  Vj  +  ^  v;  =  Evp  Xvu-  =  bER,  and  ^  j  =  (for  i>  2),  where  the 
i  =  2  i  =  1  y  =  1 

values  of  E  come  from  the  curvefit.  Clearly,  this  leaves  open  many  degrees  of 
freedom  on  the  matrix  S .  These  degrees  are  needed  to  insure  that  the  entries  in  P 
are  positive.  The  problem  of  finding  a  matrix  S  which  meets  this  requirement  has 
not  been  solved;  we  have  attempted  to  use  an  “annealing”  algorithm  of  sorts  to 
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stochastically  search  for  such  a  matrix.  However,  it  has  proved  resistant  to  these 
techniques  and  other  methods  are  currently  under  investigation.  The  details  are 
not  presented  in  this  paper. 

4.3  The  Maximum  Likelihood  Model  (Baum-Welch 
Algorithm) 

Clearly,  the  improvement  gained  in  moving  to  the  autocorrelation  method  is 
somewhat  counteracted  by  the  fact  that  its  solution  is  not  well  understood.  There¬ 
fore,  a  more  refined  method  is  desirable.  The  maximum  likelihood  model  is  such  a 
method.  The  following  presentation  is  a  mixture  of  those  found  in  [Turin93], 
[Morgera91],  and  [Poritz88], 

First,  let  us  suppose  that  we  have  a  binary  observation  vector  (that  is,  for,  exam¬ 
ple,  the  error  bit  vector  from  a  measurement  of  the  channel),  call  it  E  =  e]e2...eT. 
Then  we  wish  to  find  a  Markov  model  A  which  maximizes  the  expression 
Pr  [£|A]  .  In  other  words,  what  we  seek  is  the  Markov  model  which  has  the  high¬ 
est  probability  of  producing  the  observed  error  sequence.  Formally,  we  consider 
A  =  (A,  B,  n) ,  where  A  =  [atJ] ,  B  =  [btj\  are  matrices  which  describe  the 
Markov  model  and  tc  is  probability  vector  describing  the  likelihood  of  having 
started  in  a  particular  state;  at]  is  the  probability  of  a  transition  from  state  i  to  state 
j ,  and  btJ  is  the  probability  of  outputting  symbol  j ,  given  that  the  system  is  in  state 
i.  We  have  already  seen  this  type  of  state  model  in  the  Gilbert  model;  however, 
for  the  EIU,  we  will  require  that  all  entries  btj  are  either  0  or  1. 

The  general  approach  to  solving  this  method  in  the  literature  is  simple;  guess  at  A , 
compute  Pr  [£|A]  ,  and  then  use  a  gradient  ascent  method  to  maximize  that  quan- 
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tity.  While  this  is  quite  straightforward,  there  are  hidden  difficulties.  For  example, 
the  direct  method  for  computation  of  FrtLIA]  is  exponentially  complex  in  the 
length  of  the  error  observation.  Fortunately,  there  is  a  simpler  way  of  computing 
this  quantity,  namely  the  forward-backward  recursion.  Before  presenting  this, 
however,  we  will  need  a  little  notation. 

\k  o  -  o' 

Let  B  ( vk )  =  ®  •••  0  Let  p  (vt) .  Then  we  can  define  the  for- 

,°  0  -  V 

ward  variable  a,  (j)  =  Pr  [ev  e2, ...,  ep  it  =  ^;|X]  .  We  usually  speak  of  the  vector 

a,  =  [a,(l)  ...  a,(Ao]  •  Similarly,  we  can  define  the  backward  variable 

P  ,(i)  =  Pr[el  +  V  el+2, ...,  eT\it  =  qi,  X]  ,  with  p,  defined  as  a  column  vector  in  the 
obvious  manner.  Then  it  is  fairly  easy  to  show  that 


a0  =  n  (EQ  16) 

a,  +  i  =  a,F(e,+  1)  (EQ  17) 

Pr  =  *  (EQ  18) 

P,  =  ^,  +  1)P,+  i  (EQ  19) 

Furthermore,  it  is  clear  that,  for  any  t , 

Pr  [£|X]  =  a,p,  (EQ  20) 


Now,  we  can  present  one  formulation  of  the  Baum-Welch  reestimation  equations. 
3A 

Define  U-  =  (that  is,  the  matrix  that  is  0  everywhere  except  in  position  ij , 
where  it  is  1).  Then  given  X  =  (A,  B,  n) ,  we  reestimate  using: 


16 


(EQ  21) 


aiiEa'ud,i(e‘+i)V,+i 


T-  1 


I  “,w 


V  ~  T 

Xa»y»P' 

J=  1 

-  “0  (0  Po  (0 

n‘  ~  Pr[E\X ] 


(EQ  22) 


(EQ  23) 


Notice  the  computational  complexity  -  linear  in  the  length  of  the  observation 
sequence.  Since  we  expect  to  be  working  with  very  long  observation  sequences,  it 
is  worth  our  while  to  investigate  the  suggestions  made  in  [Turin93]  to  speed  up 
the  computation  by  taking  advantage  of  long  sequences  of  repeated  symbols.  Let 
us  suppose  that  we  can  break  up  our  error  observation  string  into  segments,  each 
of  which  consist  of  a  single  symbol,  repeated  one  or  more  times; 

i 

E  =  ele2...eT  =  x['Xl2\..Xlx\  For  convenience,  we  also  define  tt  =  to  be  the 

i  =  \ 


boundary  times  between  the  runs  of  symbols. 


Now,  using  this  extra  information,  we  can  rewrite  the  forward-backward  recur¬ 
sions  in  a  more  efficient  way,  remembering  that  matrix  powers  can  be  computed 

i 

|  efficiently  by  any  number  of  methods: 


(EQ  24) 


a, 


=  a.P 


h  +  i 


<■ e‘,J 


(EQ  25) 
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p«.  =  l 

h=pl,"^rjK> 


(EQ  26) 


(EQ  27) 

Notice  that  the  same  K  matrices  (K  is  the  number  of  symbols  in  the  alphabet;  in 
the  case  of  error  sequences,  it  is  always  2)  are  being  raised  to  various  powers.  For 
this  reason,  it  is  probably  preferable  to  compute  the  matrix  powers  using  an  eigen 
decomposition,  because  the  expense  of  computing  the  eigenvalues  and  eigenvec¬ 
tors  can  be  easily  payed  for  by  the  vastly  simplified  computations  that  result. 

However,  notice  that,  in  the  reestimation  formula,  we  take  the  sums  over  t  of  a 
and  p .  Unless  we  can  somehow  find  a  faster  way  to  compute  these  sums,  we  can¬ 
not  take  advantage  of  the  above  computations,  which  skip  intermediate  values. 
Fortunately,  [Turin93]  continues  on  to  solve  this  problem  as  well.  Consider,  for 
example,  the  numerator  of  the  reestimation  equation  for  atj : 

t-  i  T 

°v  =  X  a'UVbi  Pf  1  =  X  a7*  (EQ  28> 

'* o  k=\ 

<V  =  X  (EQ  29) 

'  =  '* 

cuxt  Ok)  =  X  py(X0  uubj  (x0  p‘*~y~ 1  {Xk)  (EQ  30) 

Y  =  0 

We  notice  that  CjjX*  (lk)  is  in  the  form  of  a  matrix  convoludon,  and  thus  take  the  z- 
transform: 

®ijxt  (z)  =  Z(CijXt(l) )  =  £  z~‘c.ixt  (0  (EQ  31) 

/  =  i 
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(EQ  32) 


-  /-i 

4>i]Xk(z)  =  u^iXjpt-y-'iX,) 

f  =  1  y  =  0 

«&..Xt  (z)  =  X  X  [z~lp  (X*}  1  (X*> z_1  [z~lp  (X*}  1  ,_T' 1  (EQ  33) 

Y  =  0/  =  Y+  1 

Now,  if  we  let  8  =  /  -  y-  1  -  then  we  can  say: 

f  -  \  (  -  > 

OijX  (z)  =  Y  [z-'f  (X,)  ]  t  Uijb]  <Xt)  z-1  X  [z-‘P  (Xt)  ] 6  (EQ  34) 

\y=0  '  ^8  =  0  ' 

<S>ijXk(z)  =  [z/-/>  (Xt) ]  -lUtJbj (Xk) z[zI-P (Xk)  ]  -1  (EQ  35) 

Here,  we  must  pause  to  describe  a  decomposition  that  allows  us  to  better  describe 
this  z-transform  -  in  particular,  a  way  of  rewriting  [zI-P(Xk)]~l .  Let  us  first 
focus  on  the  matrix  P  (Xk) .  If  we  let  Xki,  Xk2, \N  be  the  eigenvalues  of  P  (Xk) , 

Xtl  0  ...  0 

with  Ak  =  0  X*2  -  0  ,  and  Sk  =  [Vjll  vk2  ...  vtN]  be  the  matrix  whose  col- 

0  0  ...  XkN^ 

umns  are  the  eigenvectors  of  P  (Xk) ,  then  it  is  well  known  that 

P  (Xk)  =  SkAkSk}  (EQ  36) 

N 

If  we  then  rewrite  Ak  =  ^  Ww  we  see  *at 

i  =  i 

N 

P(Xk)  =  A,,  where  (EQ  37) 
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(EQ  38) 


Dki  =  SkUuS ? 

N 

Also,  since  ( P(Xk))a  =  ,  it  is  easy  to  see  that  (P(Xk))a  =  ^X^Dki. 

i  =  1 

However,  we  now  turn  our  attention  to  the  matrix  zI-P{Xk)  .  Notice  that 

N  N  N 

zl  =  zSIS ^  zS U  J-1,  while  P(Xk)  =  ^  Xkpki  =  £  Xk^U^~l ;  thus 

«  =  i  ;  =  i  i  =  l 


z/-P(Xt)  =  2  (z-Xti)SUitS-'  =  X  (z-XjZ)t,  (EQ  39) 

i  »  1  i  =  1 

In  other  words,  we  can  treat  zI-P  (Xk)  just  as  P  (Xk) ,  except  with  eigenvalues 
z-Xkl,z-  Xk2,  ...,z-  XkN .  Hence,  it  is  clear  that 

N  N 

W-P(X,)]- 1  =  X  =  X2-r|-  (EQ40) 

1  =  1  t  =  1 

Now,  returning  to  our  characterization  of  <$>ijX  (z)  ,  and  substituting  in  the  above  expres¬ 
sion,  we  see  that 

N  N 

°«.(z)  -  X Ifl>w».)^(,.iit)Vv))  (EQ41) 

E= 1£= 1 

Now,  since  the  z-transform  is  a  linear  operator,  we  need  only  find  the  inverse  z- 
transforms  of  each  element  of  the  sum  in  order  to  recover  CijX  (lk)  from  <bijX  (z) . 


_ z _ 

(z-X)  (z-p) 


V-\il 
X- 


if 


(EQ  42) 


z 

Q-X)2 


(EQ  43) 
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Therefore,  it  is  easy  to  see  that 


N  N 


<W'*>  =  IIW/W¥-'((T 


e=  i;= 1 


hkE)  (z  -  X^) 


where  (EQ  44) 


z1 


((z-Xte)  (z-Xt?)) 
K(z-XteMz-Xtc)) 


XA  -  x^- 


,  xie  -  X^ 


,  if  XiE  *■  ,  or 


=  t  Xf*  1 ,  if  X.f  =  X, 


k e  _ 


(EQ  45) 

(EQ  46) 


This  allows  us  to  calculate  the  desired  quantities  in  a  much  more  efficient  manner. 


Finally,  one  remaining  implementation  issue  needs  to  be  dealt  with.  Because  the 
computations  involve  the  exponentiation  of  stable  matrices,  we  must  take  care  to 
avoid  numerical  underflow.  During  the  computations  of  a  and  p ,  we  should  store 
vectors  which  are  normalized  so  that  the  sum  of  their  elements  is  one.  The  scaling 
factors  used  should  also  be  stored,  as  they  will  be  needed  to  successfully  evaluate 
the  other  equations.  A  description  of  this  process  can  be  found  in  [Morgera91]  or 
[Turin93]. 


5  A  Test  Scenario 

In  order  to  explain  how  these  methods  fit  together,  this  section  will  describe  an 
idealized  testing  scenario,  from  beginning  to  end. 

Company  C  wishes  to  characterize  the  performance  of  Application  A  over  a  Satel¬ 
lite  Channel  SC.  Therefore,  Company  C  arranges  time  on  the  channel  SC,  and  sets 
up  and  runs  application  A  over  the  channel  when  they  get  it  In  addition,  they  take 


21 


measurements  of  the  channel  at  a  bit  level,  either  at  the  same  time  as  the  test  is 
running,  or  separately,  perhaps  at  night. 

Now,  as  was  expected,  the  application  A  ran  fairly  poorly.  Therefore,  the  applica¬ 
tion  folks  begin  to  work  on  its  ability  to  work  under  such  adverse  circumstances. 
Meanwhile,  channel  characterization  can  take  place.  First,  each  of  the  available 
models  are  used  on  the  available  data  to  generate  channel  models.  The  faster  mod¬ 
els  are  built  first;  much  of  the  next  step  can  be  done  while  the  more  complicated 
ones  are  being  computed.  Now,  once  the  models  are  available,  the  distribution  of 
error  bursts  and  error  free  bursts  and  their  autocorrelation  functions  can  be  com¬ 
pared  with  the  measured  distributions  and  autocorrelations,  taken  from  the  data 
itself.  Also,  the  probability  that  each  model  generated  the  given  error  strings  can 
be  computed.  (Note  that  these  probabilities  decrease  exponentially  fast  with  the 
length  of  the  error  string.  It  is  not  the  absolute  size  of  these  probabilities,  but  the 
relative  size  which  is  of  interest.)  Now,  using  these  techniques,  it  should  be  possi¬ 
ble  to  identify  a  few  models  which  seem  most  promising.  These  models  (and  per¬ 
haps  some  others,  as  well)  should  reach  the  next  stage  of  testing. 

The  selected  models  should  then  be  loaded  into  an  EIU  (or  other  similar  device). 
The  test  of  application  A  which  was  conducted  earlier  should  be  repeated  over 
each  of  the  modelled  links.  Those  whose  performance  most  closely  match  the  per¬ 
formance  observed  during  the  actual  test  are  to  be  regarded  as  best  for  this  appli¬ 
cation. 

Now,  by  this  time,  the  applications  folks  should  have  some  improvements  in 
place,  creating  a  new  version  of  application  A,  call  it  A’.  But  by  now  they’re  won¬ 
dering  if  they  improved  it  enough  to  cover  for  the  channel  errors,  or  if  they  have 
sandbagged  the  performance  by  providing  too  much  redundancy.  The  channel 
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model(s)  which  have  been  selected  can  be  used  to  evaluate  these  question.  When 
the  revision  performs  well  on  the  modelled  channel,  company  C  can  again  arrange 
time  on  channel  SC,  and  can  test  and  tune  the  parameters  of  the  new  model.  In 
addition,  of  course,  it  may  very  well  be  that  a  new  application,  B,  needs  to  be 
tested.  And  so  here  the  cycle  swings  full  circle  and  the  process  begins  again. 

6  Conclusions 

This  work  has  presented  newer  and  more  sophisticated  methods  for  the  develop¬ 
ment  of  error  channel  models  than  those  which  were  previously  in  use  at  Rome 
Laboratory.  These  methods  have  been  implemented  on  available  computer  sys¬ 
tems,  and  have  been  placed  firmly  within  a  development  cycle  which  includes 
data  collection,  model  building,  and  evaluation  of  the  fitness  of  the  various  mod¬ 
els.  While  further  work  is  still  required  on  the  autocorrelation  matching  method, 
the  other  methods  provide  an  array  of  models  varying  in  complexity  and  fidelity. 
Further  study  is  required  to  predict  the  quality  of  each  method,  and  to  understand 
the  trade-off  between  model  quality  and  computation  time. 
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